Quality-Configurable Memory Hierarchy Through Approximation
Authors
Abstract
The memory subsystem is a major contributor to the overall performance and energy consumption of embedded computing platforms. The emergence of "killer" applications such as data-intensive recognition, mining, and synthesis (RMS) applications puts even more stress on the memory subsystem and exacerbates its energy consumption. Traditional mechanisms to ensure data integrity deploy overdesign (e.g., redundancy and error detection/correction) and/or guardbanding, which consume a significant part of the energy used by the memory subsystem. We explore opportunities for energy efficiency by exploiting the intrinsic tolerance of a vast class of approximate computing applications to some level of error in the on-chip memory hierarchy. We present two exemplars outlining the typical software and hardware mechanisms that are required for different components in the memory hierarchy, implemented in varying technologies such as SRAM and STT-MRAM.
Similar resources
Ultra-Efficient Content Addressable Memory for Tunable GPU Approximation
In this paper, we describe a resistive configurable associative memory (ReCAM) that enables selective approximation and asymmetric voltage overscaling to manage delivered efficiency. The ReCAM structure matches an input pattern with pre-stored ones by applying an approximate search on selected bit indices (bitline-configurable). To further reduce energy, we explore proper ReCAM sizing, various ...
Use of an Embedded Configurable Memory for Stream Image Processing
We examine the use of the embedded Blackfin BF561 processor for high-definition image processing using the stream model of computing. The Blackfin features a configurable memory hierarchy that minimizes the Memory Wall effect. We describe the stream model and its application to the BF561 to utilize low-latency on-chip memory and compare to a worst-case baseline using SDRAM only. We find a 2X to...
Simulation and Architectural Exploration of a Shared-Memory Multiprocessor Node for Scientific Algorithms
In this thesis, GEMS (a Generic Environment for Multiprocessor Simulations) is presented. GEMS is a simulation environment written in the Superlog language, which simulates a configurable shared-memory multiprocessor system. Simulation focuses on the memory hierarchy and the system interconnect. Part of GEMS is a directory-based cache coherence protocol. This protocol is an adaptation of the bu...
Chapter 6 TUNING CACHES TO APPLICATIONS FOR LOW-ENERGY EMBEDDED SYSTEMS
The power consumed by the memory hierarchy of a microprocessor can account for as much as 50% of the total microprocessor system power, and is thus a good candidate for power and energy optimizations. We discuss four methods for tuning a microprocessor's cache subsystem to the needs of any executing application for low-energy embedded systems. We introduce on-chip hardware implementing an effi...
Accelerating Blocked Matrix-Matrix Multiplication using a Software-Managed Memory Hierarchy with DMA
The optimization of matrix-matrix multiplication (MMM) performance has been well studied on general-purpose desktop and server processors. Classic solutions exploit common microarchitectural features including superscalar execution and the cache and TLB hierarchy to achieve near-peak performance. Typical digital signal processors (DSPs) do not have these features, and instead use in-order execu...